{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "


\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Issue 11" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from persist.archive import Archive\n", "x = [1, 2, 3]\n", "y = [x, x]\n", "a = Archive(scoped=False)\n", "a.insert(y=y)\n", "s = str(a)\n", "d = {}\n", "exec(s, d)\n", "y_ = d['y']\n", "assert y[0] is y[1]\n", "assert y_[0] is y_[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After a bit of inspection, I found that the problem occurs at the reduction state. The call to `Graph.reduce()` is a bit too eager. Need to find out why. (Found a few bugs in the constructors for `Graph` and `Graph_` which should be similar...)\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "image/png": { "width": "20%" } }, "output_type": "display_data" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": { "image/png": { "width": "20%" } }, "output_type": "display_data" } ], "source": [ "from IPython.display import Image, display\n", "import os\n", "import sys\n", "import trace\n", "\n", "from persist.archive import Archive\n", "x = [1, 2, 3]\n", "y = [x, x]\n", "\n", "from pycallgraph2 import PyCallGraph\n", "from pycallgraph2.output import GraphvizOutput\n", "\n", "images = []\n", "\n", "a = Archive(scoped=True)\n", "a.insert(y=y)\n", "with PyCallGraph(output=GraphvizOutput()):\n", " s = str(a)\n", "images.append(Image(filename='pycallgraph.png', width=\"20%\"))\n", "os.remove('pycallgraph.png')\n", "\n", "a = Archive(scoped=False)\n", "a.insert(y=y)\n", "with PyCallGraph(output=GraphvizOutput()):\n", " s = str(a)\n", "images.append(Image(filename='pycallgraph.png', width=\"20%\")) \n", "os.remove('pycallgraph.png')\n", "\n", "display(*images)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'Archive' object has no 
attribute '_graph'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mg\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_graph\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnodes\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAttributeError\u001b[0m: 'Archive' object has no attribute '_graph'" ] } ], "source": [ "g = a._graph\n", "g.nodes" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# create a Trace object, telling it what to ignore, and whether to\n", "# do tracing or line-counting or both.\n", "tracer = trace.Trace(\n", " ignoredirs=[sys.prefix, sys.exec_prefix],\n", " trace=1,\n", " count=1)\n", "\n", "# run the new command using the given tracer\n", "tracer.run('str(a)')\n", "\n", "# make a report, placing output in the current directory\n", "r = tracer.results()\n", "#r.write_results(show_missing=True, coverdir=\".\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The question here is who is responsible for ensuring that objects are not duplicated?\n", "* The method `get_persistent_rep_list()` uses two names... so not here (but still has the same object).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Import Issues" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The objective is to prevent the data from being loaded until `import ds.x` is called. This allows multiple processes to work with data independently with a lock file being required only when changing the metadata in `_info_dict`. 
To implement this we use the following trick suggested by [Alex Martelli](https://en.wikipedia.org/wiki/Alex_Martelli):\n", "\n", " * [Can modules have properties the same way that objects can?](https://stackoverflow.com/a/880550/1088938)\n", "\n", "Such a module might look like this:\n", " \n", "```python\n", "import sys\n", "import numpy as np\n", "sys.modules[__name__] = np.array([0, 1, 2, 3])\n", "```\n", " \n", "or like this (if loaded from disk):\n", " \n", "```python\n", "import os.path\n", "import sys\n", "import numpy as np\n", "datafile = os.path.splitext(__file__)[0] + \"_data.npy\"\n", "sys.modules[__name__] = np.load(datafile)\n", "```\n", "\n", "This seems to work very nicely. The imported array appears as part of the top-level module no matter how it is imported, and is only loaded when explicitly requested." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Byte Compiling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a very subtle issue with using the import mechanism for DataSets. When updating an attribute of a DataSet, we change the corresponding `.py` file on disk. However, if this change is made too quickly after importing the attribute, it is possible that the byte-compiled `.pyc` file might not be finished compiling until *after* the `.py` file is updated. In this case, the `.py` file will have an earlier timestamp than the `.pyc` file, and so Python will incorrectly assume that the `.pyc` file is authoritative.\n", "\n", "I do not see a good solution yet, so for now we disable bytecode generation with [`sys.dont_write_bytecode`](https://docs.python.org/2/library/sys.html?highlight=bytecode#sys.dont_write_bytecode):\n", "\n", "* https://stackoverflow.com/a/154617" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reloading" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Reloading data can also be an issue when replacing modules. 
For example, if we have a module `mod` and an array `d` that is replaced, then what should `reload(mod)` do if `d` is updated on disk?\n", "\n", "Our solution is the following:\n", "\n", "* Delete all attributes like `mod.d` upon reload so that the user needs to re-import `mod.d` etc. This is done in the `__init__.py` file.\n", "* This behaviour is inconsistent with the normal Python import machinery, so we only do it for objects that were specified with `single_item_mode`. The reason we do it here is that the user cannot `reload(mod.d)` since this is an array. If the user does not want this behaviour, then they can disable `single_item_mode`.\n", "* The alternative behaviour might be to reload all data that has already been imported. We might provide a flag for this later if requested." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [conda env:work]", "language": "python", "name": "conda-env-work-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" }, "toc": { "base_numbering": 1, "nav_menu": { "height": "30px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": "block", "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, 
"nbformat_minor": 1 }